Languages with mismatches

Authors

  • Chiara Epifanio
  • Alessandra Gabriele
  • Filippo Mignosi
  • Antonio Restivo
  • Marinella Sciortino

Abstract

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in S with at most k mismatches in every window of size r. The study of this class of languages focuses mainly on two objects: a parameter called the repetition index, and the set of minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer such that every string of that length occurs, up to errors, in at most one position of S. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching that has average size $O(|S| \cdot \log^{k+1}|S|)$ and answers a query for any word x in time $O(|x| + |\mathrm{occ}(x)|)$, where occ(x) is the list of all occurrences of x in S up to errors. © 2007 Elsevier B.V. All rights reserved.
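The abstract defines occurrence with errors, the repetition index, and minimal forbidden words only in prose. The brute-force Python sketch below makes these notions concrete for tiny texts. It assumes that a word occurs at position i when it has at most k mismatches in every window of r consecutive symbols. As a labeled assumption, it also treats a word shorter than r as a single window. All function names (occurs_at, occurrences, in_language, repetition_index, minimal_forbidden_words) are hypothetical. Enumeration is exponential in the word length, so the sketch only supports checking small cases by hand. It is not the paper's indexing data structure.

```python
from itertools import product


def occurs_at(word: str, text: str, i: int, k: int, r: int) -> bool:
    """True if `word` occurs in `text` at position i with at most k
    mismatches in every window of r consecutive positions.

    Assumption (hypothetical convention): a word shorter than r counts as
    a single window, so it may contain at most k mismatches in total.
    """
    m = len(word)
    if i < 0 or i + m > len(text):
        return False
    # mism[p] == 1 where the word and the aligned text symbol differ
    mism = [1 if word[p] != text[i + p] else 0 for p in range(m)]
    if m <= r:
        return sum(mism) <= k
    # Slide a window of size r over the alignment with a running count.
    count = sum(mism[:r])
    if count > k:
        return False
    for p in range(r, m):
        count += mism[p] - mism[p - r]
        if count > k:
            return False
    return True


def occurrences(word: str, text: str, k: int, r: int) -> list[int]:
    """All start positions where `word` occurs in `text` up to errors."""
    return [i for i in range(len(text) - len(word) + 1)
            if occurs_at(word, text, i, k, r)]


def in_language(word: str, text: str, k: int, r: int) -> bool:
    """Membership in the language of factors of `text` with errors."""
    return any(occurs_at(word, text, i, k, r)
               for i in range(len(text) - len(word) + 1))


def repetition_index(text: str, alphabet: str, k: int, r: int) -> int:
    """Smallest length l such that every word of length l over `alphabet`
    occurs, up to errors, in at most one position of `text`."""
    for length in range(len(text) + 1):
        if all(len(occurrences("".join(w), text, k, r)) <= 1
               for w in product(alphabet, repeat=length)):
            return length
    # Unreachable: at length |text| only position 0 is available.
    raise RuntimeError("no length qualified")


def minimal_forbidden_words(text: str, alphabet: str, k: int, r: int) -> list[str]:
    """Words a.u.b outside the language whose prefix a.u and suffix u.b are
    both inside it (a single letter qualifies iff it is outside)."""
    result = []
    # Words in the language have length <= |text|, so every minimal
    # forbidden word has length <= |text| + 1.
    for length in range(1, len(text) + 2):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if (not in_language(w, text, k, r)
                    and in_language(w[:-1], text, k, r)
                    and in_language(w[1:], text, k, r)):
                result.append(w)
    return result


if __name__ == "__main__":
    # k = 0 reduces to exact matching.
    print(occurrences("ab", "abaab", 0, 3))              # [0, 3]
    print(repetition_index("abaab", "ab", 0, 3))         # 3
    print(minimal_forbidden_words("abaab", "ab", 0, 3))  # ['bb', 'aaa', 'bab', 'aaba']
    # "bab" vs "aaa": 2 mismatches in total, but at most 1 per window of size 2.
    print(occurs_at("bab", "aaa", 0, 1, 2))              # True
```

With k = 0 the definitions reduce to exact matching. The repetition index of abaab is therefore one more than the length of its longest repeated factor, ab. The last line shows why the window matters. The word bab is at Hamming distance 2 from aaa, yet each window of size 2 contains only one mismatch.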

Journal:
  • Theor. Comput. Sci.

Volume: 385

Publication date: 2007